## CS152: Computer Systems Architecture RISC-V Assembly, x86 Assembly (And Encoding)



Sang-Woo Jun Fall 2023



Large amount of material adapted from MIT 6.004, "Computation Structures", Morgan Kaufmann "Computer Organization and Design: The Hardware/Software Interface: RISC-V Edition", and CS 152 Slides by Isaac Scherson

# What does an ISA encoding look like?

ADD: 0x00000001, SUB: 0x00000002, LW: 0x00000003, SW: 0x00000004, ...?

Haphazard encoding makes processor design complicated!

• More chip resources, more power consumption, less performance

# **RISC/CISC** decisions



In what way is an ISA "simpler" or "complex"? And how will it affect hardware design/performance?

# **The Important Points**

□ How much work does each instruction do?

□ RISC (RISC-V) cleanly divides instructions into three categories

1. Computational operation: from register file to register file
2. Load/Store: between memory and register file
3. Control flow: jump to different part of code

Encoding formats (a field that spans several columns is written in its leftmost column; the columns it covers are left empty):

| 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 | Type |
|-------|-------|-------|-------|------|-----|------|
| funct7 | rs2 | rs1 | funct3 | rd | opcode | R-type |
| imm[11:0] | | rs1 | funct3 | rd | opcode | I-type |
| imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode | S-type |
| imm[12,10:5] | rs2 | rs1 | funct3 | imm[4:1,11] | opcode | B-type |
| imm[31:12] | | | | rd | opcode | U-type |
| imm[20,10:1,11,19:12] | | | | rd | opcode | J-type |

RV32I Base Instruction Set:

| 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 | Instruction |
|-------|-------|-------|-------|------|-----|-------------|
| imm[31:12] | | | | rd | 0110111 | LUI |
| imm[31:12] | | | | rd | 0010111 | AUIPC |
| imm[20,10:1,11,19:12] | | | | rd | 1101111 | JAL |
| imm[11:0] | | rs1 | 000 | rd | 1100111 | JALR |
| imm[12,10:5] | rs2 | rs1 | 000 | imm[4:1,11] | 1100011 | BEQ |
| imm[12,10:5] | rs2 | rs1 | 001 | imm[4:1,11] | 1100011 | BNE |
| imm[12,10:5] | rs2 | rs1 | 100 | imm[4:1,11] | 1100011 | BLT |
| imm[12,10:5] | rs2 | rs1 | 101 | imm[4:1,11] | 1100011 | BGE |
| imm[12,10:5] | rs2 | rs1 | 110 | imm[4:1,11] | 1100011 | BLTU |
| imm[12,10:5] | rs2 | rs1 | 111 | imm[4:1,11] | 1100011 | BGEU |
| imm[11:0] | | rs1 | 000 | rd | 0000011 | LB |
| imm[11:0] | | rs1 | 001 | rd | 0000011 | LH |
| imm[11:0] | | rs1 | 010 | rd | 0000011 | LW |
| imm[11:0] | | rs1 | 100 | rd | 0000011 | LBU |
| imm[11:0] | | rs1 | 101 | rd | 0000011 | LHU |
| imm[11:5] | rs2 | rs1 | 000 | imm[4:0] | 0100011 | SB |
| imm[11:5] | rs2 | rs1 | 001 | imm[4:0] | 0100011 | SH |
| imm[11:5] | rs2 | rs1 | 010 | imm[4:0] | 0100011 | SW |
| imm[11:0] | | rs1 | 000 | rd | 0010011 | ADDI |
| imm[11:0] | | rs1 | 010 | rd | 0010011 | SLTI |
| imm[11:0] | | rs1 | 011 | rd | 0010011 | SLTIU |
| imm[11:0] | | rs1 | 100 | rd | 0010011 | XORI |
| imm[11:0] | | rs1 | 110 | rd | 0010011 | ORI |
| imm[11:0] | | rs1 | 111 | rd | 0010011 | ANDI |
| 0000000 | shamt | rs1 | 001 | rd | 0010011 | SLLI |
| 0000000 | shamt | rs1 | 101 | rd | 0010011 | SRLI |
| 0100000 | shamt | rs1 | 101 | rd | 0010011 | SRAI |
| 0000000 | rs2 | rs1 | 000 | rd | 0110011 | ADD |
| 0100000 | rs2 | rs1 | 000 | rd | 0110011 | SUB |
| 0000000 | rs2 | rs1 | 001 | rd | 0110011 | SLL |
| 0000000 | rs2 | rs1 | 010 | rd | 0110011 | SLT |
| 0000000 | rs2 | rs1 | 011 | rd | 0110011 | SLTU |
| 0000000 | rs2 | rs1 | 100 | rd | 0110011 | XOR |
| 0000000 | rs2 | rs1 | 101 | rd | 0110011 | SRL |
| 0100000 | rs2 | rs1 | 101 | rd | 0110011 | SRA |
| 0000000 | rs2 | rs1 | 110 | rd | 0110011 | OR |
| 0000000 | rs2 | rs1 | 111 | rd | 0110011 | AND |
| fm, pred, succ | | rs1 | 000 | rd | 0001111 | FENCE |
| 1000, 0011, 0011 | | 00000 | 000 | 00000 | 0001111 | FENCE.TSO |
| 0000, 0001, 0000 | | 00000 | 000 | 00000 | 0001111 | PAUSE |
| 000000000000 | | 00000 | 000 | 00000 | 1110011 | ECALL |
| 000000000001 | | 00000 | 000 | 00000 | 1110011 | EBREAK |

## This is every instruction in RISC-V base ISA (RV32I)

# **RISC-V** instruction encoding

## Restrictions

- 4 bytes per instruction
- Different instructions have different parameters (registers, immediates, ...)
- Various fields should be encoded to consistent locations
  - Simpler decoding circuitry

## □ Answer: RISC-V uses 6 "types" of instruction encoding

| Name         | Bits 31:25                  | Bits 24:20 | Bits 19:15 | Bits 14:12 | Bits 11:7     | Bits 6:0 | Comments                      |
|--------------|-----------------------------|------------|------------|------------|---------------|----------|-------------------------------|
| (Field Size) | 7 bits                      | 5 bits     | 5 bits     | 3 bits     | 5 bits        | 7 bits   |                               |
| R-type       | funct7                      | rs2        | rs1        | funct3     | rd            | opcode   | Arithmetic instruction format |
| I-type       | immediate[11:0]             |            | rs1        | funct3     | rd            | opcode   | Loads & immediate arithmetic  |
| S-type       | immed[11:5]                 | rs2        | rs1        | funct3     | immed[4:0]    | opcode   | Stores                        |
| SB-type      | immed[12,10:5]              | rs2        | rs1        | funct3     | immed[4:1,11] | opcode   | Conditional branch format     |
| UJ-type      | immediate[20,10:1,11,19:12] |            |            |            | rd            | opcode   | Unconditional jump format     |
| U-type       | immediate[31:12]            |            |            |            | rd            | opcode   | Upper immediate format        |

#### Small number of types

#### We're not going to look at everything...

# 1/6: RISC-V R-Type encoding

- □ Relatively straightforward, register-register operations encoding
- **Remember**:
  - if ( inst.type == ALU ) rf[inst.arg1] = alu(inst.op, rf[inst.arg2], rf[inst.arg3])
  - In 4 bytes, type, arg1, arg2, arg3, op need to be encoded

| 31:25   | 24:20 | 19:15 | 14:12        | 11:7 | 6:0    |
|---------|-------|-------|--------------|------|--------|
| funct7  | rs2   | rs1   | funct3       | rd   | opcode |
| 7       | 5     | 5     | 3            | 5    | 7      |
| 0000000 | src2  | src1  | ADD/SLT/SLTU | dest | OP     |
| 0000000 | src2  | src1  | AND/OR/XOR   | dest | OP     |
| 0000000 | src2  | src1  | SLL/SRL      | dest | OP     |
| 0100000 | src2  | src1  | SUB/SRA      | dest | OP     |
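As a minimal sketch of how the R-type fields pack into one 32-bit word, the Python below assembles `add` and `sub`. The helper name `encode_r` is made up for illustration; the field positions and the OP opcode (0110011) are taken from the tables above.

```python
OPCODE_OP = 0b0110011  # "OP" major opcode used by register-register instructions


def encode_r(funct7, rs2, rs1, funct3, rd, opcode=OPCODE_OP):
    """Pack R-type fields into a 32-bit instruction word (hypothetical helper)."""
    for reg in (rs2, rs1, rd):
        if not 0 <= reg < 32:
            raise ValueError("register index must be in 0..31")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


# add x3, x1, x2 -> funct7 = 0000000, funct3 = 000
add_word = encode_r(0b0000000, rs2=2, rs1=1, funct3=0b000, rd=3)
# sub x3, x1, x2 -> same funct3, but funct7 = 0100000
sub_word = encode_r(0b0100000, rs2=2, rs1=1, funct3=0b000, rd=3)

print(f"{add_word:#010x}")  # 0x002081b3
print(f"{sub_word:#010x}")  # 0x402081b3
```

Note that `add` and `sub` differ in a single bit of funct7, which is what lets the decoder treat them almost identically.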

# 1/6: R-Type Computational operations

- □ Arithmetic, comparison, logical, shift operations
- Register-register instructions
  - 2 source operand registers
  - 1 destination register
  - Format: op dst, src1, src2

| Arithmetic | Comparison | Logical      | Shift         |
|------------|------------|--------------|---------------|
| add, sub   | slt, sltu  | and, or, xor | sll, srl, sra |

- slt, sltu: set less than, set less than unsigned (Signed/unsigned?)
- sll, srl, sra: shift left logical, shift right logical, shift right arithmetic (Arithmetic/logical?)
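The signed/unsigned and arithmetic/logical distinctions are easiest to see on a value whose top bit is set. The following is a rough Python model of the four instructions on 32-bit values; the function names simply mirror the mnemonics and are not part of any real library.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def slt(a, b):
    """Set less than: compares the operands as signed values."""
    return int(to_signed(a) < to_signed(b))


def sltu(a, b):
    """Set less than unsigned: compares the raw bit patterns."""
    return int((a & MASK32) < (b & MASK32))


def srl(a, shamt):
    """Shift right logical: vacated upper bits are filled with zeros."""
    return (a & MASK32) >> (shamt & 31)


def sra(a, shamt):
    """Shift right arithmetic: vacated upper bits copy the sign bit."""
    return (to_signed(a) >> (shamt & 31)) & MASK32


print(slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))  # 1 0
print(hex(srl(0x80000000, 4)), hex(sra(0x80000000, 4)))  # 0x8000000 0xf8000000
```

The same bit pattern 0xFFFFFFFF is -1 for `slt` but the largest possible value for `sltu`, so the two comparisons disagree.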

# 2/6: I-Type Encoding

□ Some instructions need "immediate" values

- e.g., "addi x1, x2, 32" <- 32 is an immediate value encoded in the instruction
- R-Type does not have slots for this

# 2/6: RISC-V I-Type encoding

## Register-Immediate operations encoding

- One register, one immediate as input, one register as output

Operands in same location!

| 31:20             | 19:15 | 14:12         | 11:7 | 6:0    |
|-------------------|-------|---------------|------|--------|
| imm[11:0]         | rs1   | funct3        | rd   | opcode |
| 12                | 5     | 3             | 5    | 7      |
| I-immediate[11:0] | src   | ADDI/SLTI[U]  | dest | OP-IMM |
| I-immediate[11:0] | src   | ANDI/ORI/XORI | dest | OP-IMM |
| offset[11:0]      | base  | 0             | dest | JALR   |
| offset[11:0]      | base  | width         | dest | LOAD   |

Immediate value limited to 12 bits signed (-2048 to 2047)!

addi x5, x6, 2048 # Error: illegal operands `addi x5,x6,2048'
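The assembler error above follows directly from the 12-bit field. Here is a small sketch of an I-type encoder that performs the same range check; `encode_i` is a hypothetical helper, and the OP-IMM opcode (0010011) comes from the RV32I listing.

```python
OPCODE_OP_IMM = 0b0010011  # major opcode for register-immediate arithmetic


def encode_i(imm, rs1, funct3, rd, opcode=OPCODE_OP_IMM):
    """Pack I-type fields; reject immediates that do not fit in 12 signed bits."""
    if not -2048 <= imm <= 2047:
        raise ValueError(f"immediate {imm} does not fit in 12 signed bits")
    return (((imm & 0xFFF) << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


word = encode_i(32, rs1=2, funct3=0b000, rd=1)   # addi x1, x2, 32
print(f"{word:#010x}")  # 0x02010093

try:
    encode_i(2048, rs1=6, funct3=0b000, rd=5)    # addi x5, x6, 2048
except ValueError as err:
    print("rejected:", err)
```

A negative immediate such as -1 is stored as the 12-bit pattern 0xFFF and sign-extended by the hardware when the instruction executes.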

# 2/6: I-Type Computational operations

## Register-immediate operations

- 2 source operands
  - One register read
  - One immediate value encoded in the instruction
    - Limited to 12 bits! (Why?)
- 1 destination register
- Format: op dst, src, imm
  - e.g., addi x1, x2, 10

| Format             | Arithmetic | Comparison  | Logical         | Shift            |
|--------------------|------------|-------------|-----------------|------------------|
| register-register  | add, sub   | slt, sltu   | and, or, xor    | sll, srl, sra    |
| register-immediate | addi       | slti, sltiu | andi, ori, xori | slli, srli, srai |

No "subi": instead, use "addi" with a negative immediate

# 3/6: RISC-V Load/Store operations

### □ Format: op dst, offset(base)

- Address specified by a pair of <base address, offset>
- e.g., lw x1, 4(x2) # Load a word (4 bytes) from memory address x2+4 into x1
- The offset is a small constant (a 12-bit signed immediate)

## Variants for types

- lw/sw: Word (4 bytes)
- lh/lhu/sh: Half (2 bytes)
- lb/lbu/sb: Byte (1 byte)
- 'u' variant is for unsigned loads
  - Half and byte loads extend the loaded data to 32 bits: signed loads (lh, lb) sign-extend, unsigned loads (lhu, lbu) zero-extend
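To make the difference between the signed and 'u' load variants concrete, here is a short Python model of what happens to a loaded byte or half before it reaches the 32-bit register. The helper names are illustrative only.

```python
def sign_extend(value, bits):
    """Widen a `bits`-wide value to 32 bits by copying its top bit (lb, lh)."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return ((value ^ sign_bit) - sign_bit) & 0xFFFFFFFF


def zero_extend(value, bits):
    """Widen a `bits`-wide value to 32 bits by filling with zeros (lbu, lhu)."""
    return value & ((1 << bits) - 1)


byte_in_memory = 0x80                        # -128 if read as signed, 128 if unsigned
lb_result = sign_extend(byte_in_memory, 8)   # what lb would place in the register
lbu_result = zero_extend(byte_in_memory, 8)  # what lbu would place in the register

print(f"{lb_result:#010x}")   # 0xffffff80
print(f"{lbu_result:#010x}")  # 0x00000080
```

Word loads need no extension, which is why there is no separate unsigned word load in RV32I.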

# 3/6: S-Type encoding

□ Store operation: two register inputs, no output

e.g., sw src, offset(base)
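Because a store has no destination register, the S-type format reuses the rd bit positions for the low five bits of the offset. The sketch below shows the split; `encode_s` is a hypothetical helper, and the STORE opcode (0100011) and funct3 = 010 for `sw` come from the RV32I listing.

```python
OPCODE_STORE = 0b0100011


def encode_s(imm, rs2, rs1, funct3, opcode=OPCODE_STORE):
    """Pack S-type fields; the 12-bit offset is split into imm[11:5] and imm[4:0]."""
    if not -2048 <= imm <= 2047:
        raise ValueError(f"offset {imm} does not fit in 12 signed bits")
    imm &= 0xFFF
    imm_11_5 = imm >> 5      # goes where funct7 sits in R-type
    imm_4_0 = imm & 0x1F     # goes where rd sits in R-type
    return ((imm_11_5 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (imm_4_0 << 7) | opcode)


word = encode_s(4, rs2=1, rs1=2, funct3=0b010)   # sw x1, 4(x2)
print(f"{word:#010x}")  # 0x00112223
```

Splitting the immediate keeps rs1 and rs2 in the same bit positions as in R-type, at the cost of a slightly awkward immediate.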



# 4/6: RISC-V Control flow instructions – Branching

- □ Format: cond src1, src2, label
- □ If condition is met, jump to label. Otherwise, continue to the next instruction

| beq | bne | blt        | bge         | bltu         | bgeu          |
|-----|-----|------------|-------------|--------------|---------------|
| ==  | !=  | < (signed) | >= (signed) | < (unsigned) | >= (unsigned) |

        bge  x1, x2, else    # if (a >= b) go to else
        addi x3, x1, 1       # c = a + 1
        beq  x0, x0, end     # always taken: skip the else part
    else:
        addi x3, x2, 2       # c = b + 2
    end:

(Assume x1=a; x2=b; x3=c;)

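# 4/6: SB-Type encoding

Conditional branch: two register inputs and one immediate (the branch offset), no register output



Only 12 bits of offset can fit (imm[12:1]; bit 0 is always 0)! -> Branch target can be at most about 2^12 bytes (4 KiB) away from the PC in either direction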
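The scrambled immediate layout of the branch format (imm[12,10:5] and imm[4:1,11]) is easier to follow in code. The following sketch encodes and decodes a branch offset; `encode_b` and `decode_b_offset` are made-up helper names, and the BRANCH opcode (1100011) comes from the RV32I listing.

```python
OPCODE_BRANCH = 0b1100011


def encode_b(offset, rs2, rs1, funct3, opcode=OPCODE_BRANCH):
    """Pack a branch; `offset` is the byte distance from the branch to its target."""
    if offset % 2 != 0:
        raise ValueError("branch offsets are multiples of 2 bytes")
    if not -4096 <= offset <= 4094:
        raise ValueError("branch target out of range")
    imm = offset & 0x1FFF               # 13-bit two's complement, bit 0 is zero
    bit12 = (imm >> 12) & 0x1
    bits10_5 = (imm >> 5) & 0x3F
    bits4_1 = (imm >> 1) & 0xF
    bit11 = (imm >> 11) & 0x1
    return ((bit12 << 31) | (bits10_5 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (bits4_1 << 8) | (bit11 << 7) | opcode)


def decode_b_offset(word):
    """Recover the signed byte offset from a B-type instruction word."""
    imm = ((((word >> 31) & 0x1) << 12) | (((word >> 7) & 0x1) << 11)
           | (((word >> 25) & 0x3F) << 5) | (((word >> 8) & 0xF) << 1))
    return imm - (1 << 13) if imm & (1 << 12) else imm


word = encode_b(8, rs2=6, rs1=5, funct3=0b000)   # beq x5, x6, 8 bytes ahead
print(f"{word:#010x}")          # 0x00628463
print(decode_b_offset(word))    # 8
```

The odd bit order keeps the sign bit at bit 31 and most other immediate bits in the same positions as in S-type, which is the point of the consistent-location design.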

# 5/6: RISC-V Control flow instructions – Jump and Link

## Format:

- jal dst, label: Jump to 'label', store PC+4 in dst
- jalr dst, offset(base): Jump to rf[base]+offset, store PC+4 in dst
  - e.g., jalr x1, 4(x5): Jumps to x5+4, stores PC+4 in x1
- □ Why do we need two variants?
  - jal has a limit on how far it can jump
    - (Due to immediate value encoding width, shown soon)
  - jalr is used to jump to locations defined at runtime
    - Needed for many things including function calls (e.g., many callers calling one function)



# 5/6: UJ-Type encoding

One destination register, one immediate operand

• UJ-Type: JAL (Jump and link)



Only 20 bits of offset! What if target is farther?
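As with branches, the 20 encoded bits are imm[20:1] with bit 0 implied to be zero, so `jal` reaches roughly 1 MiB in either direction. A sketch of the bit shuffle follows; `encode_j` is a hypothetical helper, and the JAL opcode (1101111) comes from the RV32I listing.

```python
OPCODE_JAL = 0b1101111


def encode_j(offset, rd, opcode=OPCODE_JAL):
    """Pack jal; `offset` is the byte distance from the jal to its target."""
    if offset % 2 != 0:
        raise ValueError("jump offsets are multiples of 2 bytes")
    if not -(1 << 20) <= offset <= (1 << 20) - 2:
        raise ValueError("jump target out of range for jal")
    imm = offset & 0x1FFFFF             # 21-bit two's complement, bit 0 is zero
    bit20 = (imm >> 20) & 0x1
    bits10_1 = (imm >> 1) & 0x3FF
    bit11 = (imm >> 11) & 0x1
    bits19_12 = (imm >> 12) & 0xFF
    return ((bit20 << 31) | (bits10_1 << 21) | (bit11 << 20)
            | (bits19_12 << 12) | (rd << 7) | opcode)


print(f"{encode_j(8, rd=1):#010x}")    # jal x1, 8 bytes ahead  -> 0x008000ef
print(f"{encode_j(-4, rd=0):#010x}")   # jal x0, 4 bytes back   -> 0xffdff06f
```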

# 5/6: RISC-V Relative addressing

## □ Problem: jump target offset is small!

- For branches: 12 bits, For JAL: 20 bits
- How does it deal with larger program spaces?
- Solution: PC-relative addressing (PC = PC + imm)
  - Remember format: beq x5, x6, label
  - Translation from label to offset done by assembler
  - Works fine if branch target is nearby. If not, the assembler uses AUIPC and other tricks
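The label-to-offset translation can be sketched as a two-pass process: first record the address of every label, then replace each label operand with (target address - address of the referring instruction). The toy resolver below assumes 4-byte instructions and a label as the last operand; it is an illustration of the idea, not how any particular assembler is implemented.

```python
def resolve_labels(lines):
    """Return the instructions with label operands replaced by PC-relative offsets."""
    labels, instrs = {}, []
    # Pass 1: record the byte address of each label (4 bytes per instruction).
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            labels[line[:-1]] = 4 * len(instrs)
        elif line:
            instrs.append(line)
    # Pass 2: replace a trailing label operand with (target - PC).
    resolved = []
    for index, text in enumerate(instrs):
        mnemonic, _, rest = text.partition(" ")
        operands = [op.strip() for op in rest.split(",")]
        if operands[-1] in labels:
            operands[-1] = str(labels[operands[-1]] - 4 * index)
        resolved.append(f"{mnemonic} {', '.join(operands)}")
    return resolved


program = [
    "bge x1, x2, else",
    "addi x3, x1, 1",
    "beq x0, x0, end",
    "else:",
    "addi x3, x2, 2",
    "end:",
]
for line in resolve_labels(program):
    print(line)
```

For this program the `bge` at address 0 targets `else` at address 12, and the `beq` at address 8 targets `end` at address 16, so the offsets are 12 and 8.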



# 6/6: Load upper immediate instructions

## □ LUI: Load upper immediate

- lui dst, immediate  $\rightarrow$ dst = immediate<<12
- Can load (32-12 = 20) bits
- Used to load large (~32 bits) immediate values to registers
- lui followed by addi (load 12 bits) to load 32 bits

## □ AUIPC: Add upper immediate to PC

- auipc dst, immediate  $\rightarrow$ dst = PC + (immediate<<12)
- Can load (32-12 = 20) bits
- auipc followed by addi, then jalr to allow long jumps to any 32-bit address

Typically not used by human programmers! Assemblers use them to implement complex operations
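One subtlety in the lui + addi pairing is that addi sign-extends its 12-bit immediate, so when bit 11 of the constant is set the upper 20 bits must be rounded up by one to compensate. The sketch below shows one way an assembler could split a 32-bit constant; the function names are invented for illustration.

```python
def split_lui_addi(value):
    """Split a 32-bit constant into (lui immediate, addi immediate)."""
    value &= 0xFFFFFFFF
    upper = ((value + 0x800) >> 12) & 0xFFFFF   # 20-bit lui immediate, rounded
    lower = value & 0xFFF
    if lower >= 0x800:
        lower -= 0x1000                          # addi sees these 12 bits as negative
    return upper, lower


def lui_addi(upper, lower):
    """Model the register value after 'lui dst, upper' then 'addi dst, dst, lower'."""
    return ((upper << 12) + lower) & 0xFFFFFFFF


upper, lower = split_lui_addi(0xDEADBEEF)
print(hex(upper), lower)               # 0xdeadc -273
print(hex(lui_addi(upper, lower)))     # 0xdeadbeef
```

Here the low 12 bits 0xEEF have bit 11 set, so lui loads 0xDEADC (one more than 0xDEADB) and addi then subtracts 273.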

# 6/6: RISC-V U-Type and UJ-Type encoding

One destination register, one immediate operand

- U-Type: LUI (Load upper immediate), AUIPC (Add upper immediate to PC)
  - Typically not used by human programmers
- UJ-Type: JAL (Jump and link)



## Aside: Why is the immediate field 12 bits?

□ If most immediate values are larger, this instruction is useless!

- Why not encode more imm, and reduce register count?

| 31:20             | 19:15 | 14:12         | 11:7 | 6:0    |
|-------------------|-------|---------------|------|--------|
| imm[11:0]         | rs1   | funct3        | rd   | opcode |
| 12                | 5     | 3             | 5    | 7      |
| I-immediate[11:0] | src   | ADDI/SLTI[U]  | dest | OP-IMM |
| I-immediate[11:0] | src   | ANDI/ORI/XORI | dest | OP-IMM |

| 31:25        | 24:20 | 19:15 | 14:12  | 11:7        | 6:0    |
|--------------|-------|-------|--------|-------------|--------|
| imm[11:5]    | rs2   | rs1   | funct3 | imm[4:0]    | opcode |
| 7            | 5     | 5     | 3      | 5           | 7      |
| offset[11:5] | src   | base  | width  | offset[4:0] | STORE  |

| 31:20        | 19:15 | 14:12  | 11:7 | 6:0    |
|--------------|-------|--------|------|--------|
| imm[11:0]    | rs1   | funct3 | rd   | opcode |
| 12           | 5     | 3      | 5    | 7      |
| offset[11:0] | base  | width  | dest | LOAD   |

# Benchmark-driven ISA design



□ Make the common case fast!

12~16 bits capture most cases



Source: "CSCE 51: Lecture 03 Instruction Set Principles," Yonghong Yan, University of South Carolina

# RISC-V Design consideration: Consistent operand encoding location

□ Simplifies circuits, resulting in less chip resource usage

| 31:25                                   | 24:20 | 19:15 | 14:12  | 11:7               | 6:0    | Type    |
|-----------------------------------------|-------|-------|--------|--------------------|--------|---------|
| funct7                                  | rs2   | rs1   | funct3 | rd                 | opcode | R-type  |
| imm[11:0]                               |       | rs1   | funct3 | rd                 | opcode | I-type  |
| imm[11:5]                               | rs2   | rs1   | funct3 | imm[4:0]           | opcode | S-type  |
| imm[12], imm[10:5]                      | rs2   | rs1   | funct3 | imm[4:1], imm[11]  | opcode | SB-type |
| imm[31:12]                              |       |       |        | rd                 | opcode | U-type  |
| imm[20], imm[10:1], imm[11], imm[19:12] |       |       |        | rd                 | opcode | UJ-type |
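One practical consequence of the consistent field locations is that a decoder can slice out opcode, rd, funct3, rs1, rs2, and funct7 with the same fixed shifts before it even knows the instruction type. A minimal sketch, with a hypothetical function name:

```python
def decode_common_fields(word):
    """Extract the fixed-position fields of any RV32I instruction word.

    For formats that lack a given field, the extracted bits are simply
    ignored later (they hold immediate bits instead).
    """
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "rs2": (word >> 20) & 0x1F,
        "funct7": (word >> 25) & 0x7F,
    }


add_fields = decode_common_fields(0x002081B3)   # add x3, x1, x2 (R-type)
sw_fields = decode_common_fields(0x00112223)    # sw x1, 4(x2)   (S-type)
print(add_fields["rs1"], add_fields["rs2"], add_fields["rd"])  # 1 2 3
print(sw_fields["rs1"], sw_fields["rs2"])                      # 2 1
```

In hardware this means the register file read addresses can be wired straight from the instruction bits, in parallel with the rest of the decode.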

## CS152: Computer Systems Architecture x86 Assembly (And Encoding)




# x86 encoding

## Many many complex instructions

- Fixed-size encoding will waste too much space
- Variable-length encoding!
- 1 byte to 15 bytes per instruction

## □ Complex decoding logic in hardware

- Hardware translates instructions to simpler micro operations
  - Simple instructions: 1-to-1
  - Complex instructions: 1-to-many
- Microengine similar to RISC
- Market share makes this economically viable

Comparable performance to RISC! But with translation overhead. Compilers avoid complex instructions.



# Meanwhile: x86 – Addressing modes

□ Typical x86 assembly instructions have many addressing mode variants

| Source/dest operand | Second source operand |
|---------------------|-----------------------|
| Register            | Register              |
| Register            | Immediate             |
| Register            | Memory                |
| Memory              | Register              |
| Memory              | Immediate             |

#### • e.g., 'add' has two input operands, storing the sum in the second

- add <reg>, <reg>
- add <mem>, <reg>
- add <reg>, <mem>
- add <imm>, <reg>
- add <imm>, <mem>

Examples

```
add $10, %eax      # EAX is set to EAX + 10
addb $10, (%eax)   # add 10 to the single byte stored at the memory address held in EAX
```

CISC! But no "Memory -> Memory"

# CISC ISAs typically mix arithmetic + load/store

- □ Remember x86 "add" example
  - Arithmetic instruction can access memory, store in memory
- □ Some special Load/Store instructions also do exist
  - e.g., "mov" with same addressing modes
  - e.g., "vmovupd" in AVX extensions...


# x86 Complex addressing modes: Complex encoding!

## □ "imul eax, [rdx+rcx\*4-0x4]"

- Encoded to single instruction "0f af 44 8a fc"
- Signed multiplication between eax, and a value from memory
- Two additions and one multiplication before memory request!
  - (Which architectural component is responsible for this arithmetic?)
- One multiplication after memory request comes back

## □ Who performs the memory address arithmetic?

- Separate ALU? Time-share ALU with actual imul operation?
- Microarchitectural details not enforced by ISA
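To see where the address arithmetic is hidden in those five bytes, the sketch below picks apart the ModRM, SIB, and displacement bytes that follow the two-byte opcode `0f af`. It handles only this one addressing form (mod = 01 with an SIB byte and an 8-bit displacement); the function and key names are invented for illustration.

```python
def decode_modrm_sib(modrm, sib, disp8):
    """Split ModRM and SIB bytes into their bit fields (single addressing form only)."""
    return {
        "mod": (modrm >> 6) & 0x3,        # 01 -> an 8-bit displacement follows
        "reg": (modrm >> 3) & 0x7,        # register operand (0 = eax)
        "rm": modrm & 0x7,                # 100 -> an SIB byte follows
        "scale": 1 << ((sib >> 6) & 0x3),
        "index": (sib >> 3) & 0x7,        # 1 = rcx
        "base": sib & 0x7,                # 2 = rdx
        "disp": disp8 - 0x100 if disp8 & 0x80 else disp8,
    }


def effective_address(fields, regs):
    """base + index * scale + displacement, using a register-number -> value map."""
    return regs[fields["base"]] + regs[fields["index"]] * fields["scale"] + fields["disp"]


code = bytes.fromhex("0f af 44 8a fc")
fields = decode_modrm_sib(code[2], code[3], code[4])
print(fields)
print(hex(effective_address(fields, {1: 3, 2: 0x1000})))  # rcx=3, rdx=0x1000 -> 0x1008
```

The decoded fields give scale 4, index rcx, base rdx, and displacement -4, matching `[rdx+rcx*4-0x4]`.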

# x86: CISC requires complex encoding!

## □ So many possibilities within a single instruction

- Complex, variable-width data to encode
- Complex, high-latency decode logic unavoidable!



Variable-length: Many fields are optional

→ The location (bit offset) of each field is always changing!

e.g., Immediate values can use 0, 1, 2, or 4 bytes to encode

Source: Bristol Community College, "CIS-77 Introduction to Computer Systems"

# Aside: Conditional execution in CISC and RISC

# Conditional execution in CISC: Condition codes

- Implicitly managed bitmap of flags
  - e.g., Carry, Overflow, Negative, Equal to zero, less than, ...
- Flags set by previously executed instruction
  - e.g., x86 "cmp" compares two values and sets condition code flags: cmp <reg>,<reg>
  - Usual addressing modes
- Jump instruction variants read condition code flags

```
je <label> (jump when equal)
jne <label> (jump when not equal)
jz <label> (jump when last result was zero)
jg <label> (jump when greater than)
jge <label> (jump when greater than or equal to)
jl <label> (jump when less than)
jle <label> (jump when less than or equal to)
```
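As a rough model of how the flags connect `cmp` to the conditional jumps, the sketch below computes the x86 zero, sign, carry, and overflow flags for a subtraction `a - b` and evaluates the signed jump conditions from them. The function names are illustrative; the flag conditions (for example, `jl` taken when SF differs from OF) follow the x86 definitions.

```python
def cmp_flags(a, b, bits=32):
    """Flags produced by computing a - b and discarding the result, as cmp does."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,                            # result was zero
        "SF": bool(result & sign),                    # result looks negative
        "CF": a < b,                                  # unsigned borrow
        "OF": bool((a ^ b) & (a ^ result) & sign),    # signed overflow
    }


def je(flags):
    return flags["ZF"]


def jl(flags):
    return flags["SF"] != flags["OF"]


def jge(flags):
    return flags["SF"] == flags["OF"]


def jg(flags):
    return not flags["ZF"] and flags["SF"] == flags["OF"]


print(jl(cmp_flags(-1, 1)))          # True: -1 < 1 as signed values
print(jl(cmp_flags(0x80000000, 1)))  # True even though the subtraction overflows
print(jg(cmp_flags(7, 3)))           # True
```

The jump never sees the compared values, only the flag bits left behind, which is the implicit state RISC-V avoids.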

# Conditional execution in CISC: Condition codes

□ Some instructions can execute only if conditions are met

- "Predicated instructions"
- ARM MOVHS (Move higher or same) only moves if previous instruction resulted in "higher or same" flag being set. Otherwise NOP
- Can remove a costly conditional branch instruction if used well
- Carry bits can be useful for large adds, ...

## Predicated instructions in ARM



[Figure: C code, and its ARM assembly without predicated instructions and with predicated instructions]

# **RISC-V** Condition codes

## □ RISC-V does not have condition codes

• Designers wanted simpler communications between pipeline stages

# Wrapping up

## □ Two ends of the spectrum: RISC and CISC

- RISC simplifies processor hardware, but same programs result in more code
- CISC reduces code volume, but complicates processor hardware

□ To reason about this trade-off, we need to know their actual effects

- How much clock speed degradation do we get with more complex decode?
- How much transistor overhead does complex decode add?
- How much does the instruction count increase with a RISC ISA?

Up next!

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
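To show how the three factors trade off, here is the equation as a tiny Python function. All numbers below are invented purely to exercise the formula; they are not measurements of any real RISC or CISC processor.

```python
def cpu_time(instructions, cycles_per_instruction, clock_hz):
    """CPU time in seconds = instructions x CPI x (seconds per clock cycle)."""
    return instructions * cycles_per_instruction / clock_hz


# Hypothetical machine A: more instructions, fewer cycles per instruction.
time_a = cpu_time(instructions=1.2e9, cycles_per_instruction=1.0, clock_hz=2.0e9)
# Hypothetical machine B: fewer instructions, more cycles per instruction.
time_b = cpu_time(instructions=1.0e9, cycles_per_instruction=1.5, clock_hz=2.0e9)

print(f"A: {time_a:.2f} s, B: {time_b:.2f} s")  # A: 0.60 s, B: 0.75 s
```

Which design wins depends entirely on the actual values of the three factors, which is why the questions above need to be answered with measurements.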

# The Important Points

□ RISC-V (RISC) instructions are cleanly divided into categories

- ALU, Branch, Memory (specifically, the six encoding types)
- Lower encoding density, but simplifies decoding
- □ x86 (CISC) does NOT cleanly divide work into categories
  - Each instruction can do a combination of ALU, Branch, Memory
  - Higher density, but complicates decoding

# The Important Points

## □ RISC-V (RISC) instructions are fixed-width

- Immediate values cannot be encoded in full (32 bits) into one instruction
  - e.g., addi encodes 12 bits, AUIPC encodes 20 bits
- ISA carefully designed to require only two instructions per 32-bit word
  - (32 registers needing 5-bit indices, opcode being 7 bits, all balance into this)
- □ x86 (CISC) instructions are variable-width
  - One immediate value can be encoded into one long (4 bytes+) instruction
  - Complicates encoding, but fewer instructions

# Aside: Handling I/O

- □ How can a processor communicate with the outside world?
- □ Special instructions? Sometimes!
  - RISC-V defines CSR (Control and Status Registers) instructions
    - Check processor capability (I/M/E/A/..?), performance counters, system calls, ...
  - "Port-mapped I/O"
    - E.g., x86 has "IN", "OUT" instructions
    - Goes back to how 8080 did I/O
    - "IN \$0x60, %al" reads a keyboard input from the PS/2 controller



Source: Wikipedia

# Aside: Handling I/O

□ For efficient communication, memory-mapped I/O

- Happens outside the processor
- I/O device directed to monitor CPU address bus, intercepting I/O requests
  - Each device assigned one or more memory regions to monitor
  - Some memory commands handled by memory, some by peripherals!

Example:

In the original Nintendo GameBoy, reading from address 0xFF00 returned a bit mask of currently pressed buttons

Both approaches require one CPU instruction per word I/O...
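The idea that an ordinary load is answered by either memory or a peripheral, depending only on the address, can be sketched with a toy bus model. The class and names below are hypothetical, and the sketch does not model the real GameBoy joypad register layout; it only reuses address 0xFF00 from the example above.

```python
class Bus:
    """Routes each load to RAM or to a device based on the address (toy model)."""

    def __init__(self, ram_size):
        self.ram = bytearray(ram_size)
        self.devices = {}                      # address -> function returning a byte

    def map_device(self, address, read_fn):
        """Ask a device to watch one address and answer loads from it."""
        self.devices[address] = read_fn

    def load_byte(self, address):
        if address in self.devices:            # a device intercepts the request
            return self.devices[address]() & 0xFF
        return self.ram[address]               # otherwise plain memory answers


JOYPAD_ADDR = 0xFF00
pressed_mask = 0b00000101                      # made-up button state

bus = Bus(ram_size=0x10000)
bus.map_device(JOYPAD_ADDR, lambda: pressed_mask)
bus.ram[0x1234] = 0x42

print(bin(bus.load_byte(JOYPAD_ADDR)))  # 0b101, answered by the device
print(hex(bus.load_byte(0x1234)))       # 0x42, answered by RAM
```

From the CPU's point of view both reads are the same load instruction, so no special I/O instructions are needed.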

# Aside: Handling I/O

- Even faster option: DMA (Direct Memory Access)
  - Off-chip DMA Controller can be directed to read/write data from memory without CPU intervention
  - Once DMA transfer is initiated, CPU can continue doing other work
  - Used by high-performance peripherals like PCIe-attached GPUs, NICs, and SSDs
    - Hopefully we will have time to talk about PCIe!
  - Contrast: Memory-mapped I/O requires one CPU instruction for one word of I/O
    - CPU busy, blocking I/O hurts performance for long latency I/O

# Wrapping up...

## Design principles

1. Simplicity favors regularity
2. Smaller is faster
3. Good design demands good compromises
4. Make the common case fast

## □ Powerful instruction $\Rightarrow$ higher performance

- Fewer instructions required, but complex instructions are hard to implement
  - May slow down all instructions, including simple ones
- Compilers are good at making fast code from simple instructions